Python: 100 pandas Exercises with Answers (Complete Edition)

Document ID: 19903269
Upload date: 2023-01-12
Format: DOCX
Pages: 13
Size: 21.23 KB


labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(data, index=labels)
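The `data` dictionary behind this frame is not shown in this excerpt. The following is a minimal sketch, assuming a small table with the columns that the later questions refer to (animal, age, visits and a yes/no priority flag). All values are illustrative assumptions, chosen only so that the selections in the exercises return something.

```python
import numpy as np
import pandas as pd

# Assumed sample data: the original `data` dict is not part of this excerpt.
# Columns are listed as age, animal, priority, visits so that a positional
# row assignment such as df.loc['k'] = [5.5, 'dog', 'no', 2] lines up.
data = {
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
}
labels = list('abcdefghij')
df = pd.DataFrame(data, index=labels)

# A few of the selections asked for in the exercises.
missing_age = df[df['age'].isnull()]                          # rows whose age is NaN
young_cats = df[(df['animal'] == 'cat') & (df['age'] < 3)]    # boolean masks combined with &
total_visits = df['visits'].sum()
mean_age = df.groupby('animal')['age'].mean()                 # NaN ages are skipped
```

Each comparison must be wrapped in parentheses before combining with `&`, because `&` binds more tightly than `==` and `<` in Python.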

5. Display a summary of the basic information about this DataFrame and its data.

[5]:

df.info()

# ...or...

df.describe()

6. Return the first 3 rows of the DataFrame df.

[6]:

df.iloc[:3]

# or equivalently

df.head(3)

7. Select just the 'animal' and 'age' columns from the DataFrame df.

[7]:

df.loc[:, ['animal', 'age']]

# or

df[['animal', 'age']]

8. Select the data in rows [3, 4, 8] and in columns ['animal', 'age'].

df.loc[df.index[[3, 4, 8]], ['animal', 'age']]

9. Select only the rows where the number of visits is greater than 3.

[4]:

df[df['visits'] > 3]

10. Select the rows where the age is missing, i.e. is NaN.

df[df['age'].isnull()]

11. Select the rows where the animal is a cat and the age is less than 3.

df[(df['animal'] == 'cat') & (df['age'] < 3)]

12. Select the rows where the age is between 2 and 4 (inclusive).

df[df['age'].between(2, 4)]

13. Change the age in row 'f' to 1.5.

[ ]:

df.loc['f', 'age'] = 1.5

14. Calculate the sum of all visits (the total number of visits).

df['visits'].sum()

15. Calculate the mean age for each different animal in df.

[8]:

df.groupby('animal')['age'].mean()

16. Append a new row 'k' to df with your choice of values for each column. Then delete that row to return the original DataFrame.

df.loc['k'] = [5.5, 'dog', 'no', 2]

# and then deleting the new row...

df = df.drop('k')

17. Count the number of each type of animal in df.

[9]:

df['animal'].value_counts()

18. Sort df first by the values in the 'age' column in descending order, then by the values in the 'visits' column in ascending order.

[10]:

df.sort_values(by=['age', 'visits'], ascending=[False, True])

19. The 'priority' column contains the values 'yes' and 'no'. Replace this column with a column of boolean values: 'yes' should be True and 'no' should be False.

df['priority'] = df['priority'].map({'yes': True, 'no': False})

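20. In the 'animal' column, change the 'snake' entries to 'python'.

[14]:

df['animal'] = df['animal'].replace('snake', 'python')

print(df)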

21. For each animal type and each number of visits, find the mean age. In other words, each row is an animal, each column is a number of visits and the values are the mean ages (hint: use a pivot table).

[15]:

df.pivot_table(index='animal', columns='visits', values='age', aggfunc='mean')

22. You have a DataFrame df with a column 'A' of integers. For example:

df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7]})

How do you filter out rows which contain the same integer as the row immediately above?

[16]:

df.loc[df['A'].shift() != df['A']]

# Alternatively, we could use drop_duplicates() here. Note
# that this removes *all* duplicates though, so it won't
# give the same result if a value reappears later in the column.
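To see where the `shift()` comparison and `drop_duplicates()` part ways, here is a small sketch on a column in which a value comes back after a gap. The sample values are an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 1, 1]})

# Keep a row only when it differs from the row directly above it.
# shift() moves the column down by one, so the first row is compared with NaN.
consecutive = df.loc[df['A'].shift() != df['A']]

# drop_duplicates() removes every repeat, including the later run of 1s.
unique_only = df.drop_duplicates(subset='A')

print(consecutive['A'].tolist())
print(unique_only['A'].tolist())
```

The first result keeps the second run of 1s because only neighbouring rows are compared; the second result keeps each value once.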

23. Given a DataFrame of numeric values, say

df = pd.DataFrame(np.random.random(size=(5, 3)))  # a 5x3 frame of float values

how do you subtract the row mean from each element in the row?

df.sub(df.mean(axis=1), axis=0)

24. Suppose you have a DataFrame with 10 columns of real numbers, for example:

df = pd.DataFrame(np.random.random(size=(5, 10)), columns=list('abcdefghij'))

Which column of numbers has the smallest sum? (Find that column's label.)

[17]:

df.sum().idxmin()
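The two `axis` arguments in the row-mean answer are easy to mix up. The sketch below, on an assumed 5x4 frame of seeded random numbers, checks that each row is centred on zero afterwards and shows what `idxmin()` returns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
df = pd.DataFrame(rng.random(size=(5, 4)), columns=list('abcd'))

# mean(axis=1) averages across the columns, giving one value per row.
# axis=0 in sub() aligns that Series with the row index, so every row
# has its own mean subtracted.
centred = df.sub(df.mean(axis=1), axis=0)

# Column sums form a Series indexed by column label; idxmin() returns
# the label of the smallest entry rather than its value.
smallest = df.sum().idxmin()
```

After the subtraction, `centred.mean(axis=1)` is zero up to floating-point rounding.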

25. How do you count how many unique rows a DataFrame has (i.e. ignore all rows that are duplicates)?

len(df) - df.duplicated(keep=False).sum()

# or perhaps more simply...

len(df.drop_duplicates(keep=False))
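With `keep=False`, every copy of a repeated row is discarded, so both expressions count the rows that occur exactly once. A short sketch on assumed sample data, contrasted with the default `keep='first'`, which counts distinct rows instead:

```python
import pandas as pd

# Assumed sample rows: (1, a) twice, (2, b) once, (3, c) twice, (3, d) once.
df = pd.DataFrame({'x': [1, 1, 2, 3, 3, 3],
                   'y': ['a', 'a', 'b', 'c', 'c', 'd']})

# Rows that are never repeated anywhere in the frame.
never_repeated = len(df.drop_duplicates(keep=False))

# Same count, obtained by subtracting every row flagged as a duplicate.
by_subtraction = len(df) - df.duplicated(keep=False).sum()

# Default behaviour keeps the first copy of each row: distinct rows.
distinct = len(df.drop_duplicates())
```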

26. You have a DataFrame that consists of 10 columns of floating-point numbers. Suppose that exactly 5 entries in each row are NaN values. For each row of the DataFrame, find the column which contains the third NaN value.

(You should return a Series of column labels.)

(df.isnull().cumsum(axis=1) == 3).idxmax(axis=1)
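The one-liner chains three steps. Here is a sketch on a smaller assumed frame (5 columns with 3 NaN values per row, rather than the 10 and 5 of the exercise) that makes each step visible.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [[nan, 1.0, nan, nan, 2.0],
     [3.0, nan, nan, 4.0, nan]],
    columns=list('abcde'),
)

# Step 1: True wherever a value is missing.
is_missing = df.isnull()

# Step 2: running count of missing values, left to right within each row.
running = is_missing.cumsum(axis=1)

# Step 3: idxmax on a boolean row returns the first column that is True,
# i.e. the column where the running count first reaches 3.
third_nan = (running == 3).idxmax(axis=1)
```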

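27. A DataFrame has a column of groups 'grps' and a column of numbers 'vals'. For example:

df = pd.DataFrame({'grps': list('aaabbcaabcccbbc'),
                   'vals': [12, 345, 3, 1, 45, 14, 4, 52, 54, 23, 235, 21, 57, 3, 87]})

For each group, find the sum of the three greatest values.

df.groupby('grps')['vals'].nlargest(3).groupby(level=0).sum()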
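The example frame can be used to check the approach end to end. This sketch uses `.groupby(level=0).sum()` for the final step, because the `level` argument of `Series.sum` is no longer available in recent pandas releases.

```python
import pandas as pd

df = pd.DataFrame({
    'grps': list('aaabbcaabcccbbc'),
    'vals': [12, 345, 3, 1, 45, 14, 4, 52, 54, 23, 235, 21, 57, 3, 87],
})

# nlargest(3) per group yields a Series indexed by (group, original row).
top3 = df.groupby('grps')['vals'].nlargest(3)

# Grouping again on index level 0 sums the three values kept for each group.
top3_sum = top3.groupby(level=0).sum()
print(top3_sum)
```

For group 'a' the three largest values are 345, 52 and 12, which sum to 409.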

28. A DataFrame has two integer columns 'A' and 'B'. The values in 'A' are between 1 and 100 (inclusive). For each group of 10 consecutive integers in 'A' (i.e. (0, 10], (10, 20], ...), calculate the sum of the corresponding values in column 'B'.

df.groupby(pd.cut(df['A'], np.arange(0, 101, 10)))['B'].sum()
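A sketch of the binning on assumed sample values that sit exactly on the bin edges, to show which side of each interval is closed. Passing `observed=False` explicitly is a choice made here so that empty bins stay in the result regardless of the pandas version's default.

```python
import numpy as np
import pandas as pd

# Assumed sample values, picked to sit on bin edges.
df = pd.DataFrame({'A': [1, 10, 11, 20, 95, 100],
                   'B': [1, 2, 3, 4, 5, 6]})

bins = np.arange(0, 101, 10)  # edges 0, 10, ..., 100 give ten bins

# pd.cut uses right-closed intervals by default: (0, 10], (10, 20], ...
# so A == 10 falls in the first bin and A == 100 in the last.
binned = pd.cut(df['A'], bins)

# observed=False keeps bins that contain no rows; their sum is 0.
sums = df.groupby(binned, observed=False)['B'].sum()
```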

29. Consider a DataFrame df where there is an integer column 'X':

df = pd.DataFrame({'X': [7, 2, 0, 3, 4, 2, 5, 0, 3, 4]})

For each value, count the difference back to the previous zero (or the start of the Series, whichever is closer). These values should therefore be [1, 2, 0, 1, 2, 3, 4, 0, 1, 2]. Make this a new column 'Y'.

izero = np.r_[-1, (df['X'] == 0).to_numpy().nonzero()[0]]  # indices of zeros

idx = np.arange(len(df))

df['Y'] = idx - izero[np.searchsorted(izero - 1, idx) - 1]
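An alternative sketch that avoids the index arithmetic by treating each zero as the start of a new block. This is not the answer given above, just another way to reach the same column; the helper name `block` is an assumption.

```python
import pandas as pd

df = pd.DataFrame({'X': [7, 2, 0, 3, 4, 2, 5, 0, 3, 4]})

# Each zero starts a new block; block 0 is the stretch before the first zero.
block = (df['X'] == 0).cumsum()

# cumcount() numbers rows 0, 1, 2, ... inside each block, which is already
# the distance back to the zero that opened the block. Rows in block 0 have
# no zero behind them, so they count from the start of the Series (+1).
df['Y'] = df.groupby(block).cumcount() + (block == 0).astype(int)

print(df['Y'].tolist())
```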

30. Consider a DataFrame containing rows and columns of purely numerical data. Create a list of the row-column index locations of the 3 largest values.

df.unstack().sort_values()[-3:].index.tolist()

31. Given a DataFrame with a column of group IDs, 'grps', and a column of corresponding integer values, 'vals', replace any negative values in 'vals' with the group mean.

def replace(group):
    mask = group < 0
    group[mask] = group[~mask].mean()
    return group

df.groupby(['grps'])['vals'].transform(replace)
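A variation on the same idea that does not modify the group in place, using `Series.where`. As in the answer above, the mean is taken over the non-negative values of each group. The sample data and the function name `replace_negatives` are assumptions for illustration.

```python
import pandas as pd

# Assumed sample data for illustration.
df = pd.DataFrame({'grps': list('aaabb'),
                   'vals': [2, -4, 6, -1, 5]})

def replace_negatives(group):
    # Mean of the non-negative values in this group only.
    fill = group[group >= 0].mean()
    # where() keeps values that satisfy the condition and substitutes
    # `fill` everywhere else, returning a new Series.
    return group.where(group >= 0, fill)

df['vals'] = df.groupby('grps')['vals'].transform(replace_negatives)
print(df)
```

Because the group means are floats, the resulting column is a float column even when the input values are integers.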

32. Implement a rolling mean over groups with window size 3, which ignores NaN values. For example, consider the following DataFrame:

>>> df = pd.DataFrame({'group': list('aabbabbbabab'),
...                    'value': [1, 2, 3, np.nan, 2, 3,
...                              np.nan, 1, 7, 3, np.nan, 8]})
>>> df
   group  value
0      a    1.0
1      a    2.0
2      b    3.0
3      b    NaN
4      a    2.0
5      b    3.0
6      b    NaN
7      b    1.0
8      a    7.0
9      b    3.0
10     a    NaN
11     b    8.0

The goal is to compute the Series:

0     1.000000
1     1.500000
2     3.000000
3     3.000000
4     1.666667
5     3.000000
6     3.000000
7     2.000000
8     3.666667
9     2.000000
10    4.500000
11    4.000000

g1 = df.groupby(['group'])['value']              # group values

g2 = df.fillna(0).groupby(['group'])['value']    # fillna, then group values

s = g2.rolling(3, min_periods=1).sum() / g1.rolling(3, min_periods=1).count()  # compute means

s.reset_index(level=0, drop=True).sort_index()

33. Create a DatetimeIndex that contains each business day of 2015 and use it to index a Series of random numbers. Let's call this Series s.

dti = pd.date_range(start='2015-01-01', end='2015-12-31', freq='B')

s = pd.Series(np.random.rand(len(dti)), index=dti)

34. Find the sum of the values in s for every Wednesday.

s[s.index.weekday == 2].sum()

35. For each calendar month in s, find the mean of values.

s.resample('M').mean()

36. For each group of four consecutive calendar months in s, find the date on which the highest value occurred.

s.groupby(pd.Grouper(freq='4M')).idxmax()

37. Create a DatetimeIndex consisting of the third Thursday in each month for the years 2015 and 2016.

pd.date_range('2015-01-01', '2016-12-31', freq='WOM-3THU')
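The date exercises above can be checked against the 2015 calendar. In this sketch the random values are seeded only for repeatability, and the monthly mean is computed by grouping on calendar-month periods, an alternative to `resample` chosen here because the month-end frequency alias differs between pandas versions.

```python
import numpy as np
import pandas as pd

dti = pd.date_range(start='2015-01-01', end='2015-12-31', freq='B')
s = pd.Series(np.random.default_rng(0).random(len(dti)), index=dti)

# freq='B' skips weekends only; public holidays are still included.
n_business_days = len(dti)

# weekday numbers run Monday=0 ... Sunday=6, so Wednesday is 2.
wednesday_total = s[s.index.weekday == 2].sum()

# One mean per calendar month, keyed by monthly Period.
monthly_mean = s.groupby(s.index.to_period('M')).mean()

# 'WOM-3THU' is the week-of-month offset: the third Thursday of each month.
third_thursdays = pd.date_range('2015-01-01', '2016-12-31', freq='WOM-3THU')
```

2015 started on a Thursday, so the first entry of `third_thursdays` is 15 January 2015.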

38. Some values in the FlightNumber column are missing. These numbers are meant to increase by 10 with each row, so 10055 and 10075 need to be put in place. Fill in these missing numbers and make the column an integer column (instead of a float column).

df['FlightNumber'] = df['FlightNumber'].interpolate().astype(int)

39. The From_To column would be better as two separate columns! Split each string on the underscore delimiter _ to give a new temporary DataFrame with the correct values. Assign the correct column names to this temporary DataFrame.

temp = df.From_To.str.split('_', expand=True)

temp.columns = ['From', 'To']

40. Notice how the capitalisation of the city names is all mixed up in this temporary DataFrame. Standardise the strings so that only the first letter is uppercase (e.g. "londON" should become "London").

temp['From'] = temp['From'].str.capitalize()

temp['To'] = temp['To'].str.capitalize()

41. Delete the From_To column from df and attach the temporary DataFrame from the previous questions.

df = df.drop('From_To', axis=1)

df = df.join(temp)

42. In the Airline column, you can see some extra punctuation and symbols have appeared around the airline names. Pull out just the airline name. E.g. '(British Airways. )' should become 'British Airways'.

df['Airline'] = df['Airline'].str.extract(r'([a-zA-Z\s]+)', expand=False).str.strip()

# note: using .strip() gets rid of any leading/trailing whitespace

43. In the RecentDelays column, the values have been entered into the DataFrame as a list. We would like each first value in its own column, each second value in its own column, and so on. If there isn't an Nth value, the value should be NaN.

Expand the Series of lists into a DataFrame named delays, rename the columns delay_1, delay_2, etc. and replace the unwanted RecentDelays column in df with delays.

delays = df['RecentDelays'].apply(pd.Series)

delays.columns = ['delay_{}'.format(n) for n in range(1, len(delays.columns) + 1)]

df = df.drop('RecentDelays', axis=1).join(delays)
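The flight DataFrame these steps operate on is not shown in this excerpt. The sketch below runs the whole clean-up on an assumed three-row frame with the same four columns; the row values are illustrative assumptions. For the delays it builds the frame directly from the list of lists, a swapped-in alternative to `.apply(pd.Series)` that pads short rows with NaN in the same way.

```python
import numpy as np
import pandas as pd

# Assumed sample frame: the flight data itself is not part of this excerpt.
df = pd.DataFrame({
    'From_To': ['LoNDon_paris', 'MAdrid_miLAN', 'londON_StockhOlm'],
    'FlightNumber': [10045, np.nan, 10065],
    'RecentDelays': [[23, 47], [], [24, 43, 87]],
    'Airline': ['KLM(!)', '<Air France> (12)', '(British Airways. )'],
})

# Linear interpolation fills the gap with 10055; the cast removes the float dtype.
df['FlightNumber'] = df['FlightNumber'].interpolate().astype(int)

# Split the route into two columns and normalise capitalisation.
temp = df['From_To'].str.split('_', expand=True)
temp.columns = ['From', 'To']
temp = temp.apply(lambda col: col.str.capitalize())
df = df.drop('From_To', axis=1).join(temp)

# Keep the first run of letters and spaces, then trim the edges.
df['Airline'] = df['Airline'].str.extract(r'([a-zA-Z\s]+)', expand=False).str.strip()

# Rows shorter than the longest list are padded with NaN.
delays = pd.DataFrame(df['RecentDelays'].tolist(), index=df.index)
delays.columns = ['delay_{}'.format(n) for n in range(1, len(delays.columns) + 1)]
df = df.drop('RecentDelays', axis=1).join(delays)

print(df)
```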

44. Given the lists letters = ['A', 'B', 'C'] and num

Source link: https://www.bdocx.com/doc/19903269.html
